Proportionate gradient updates with PercentDelta
Author
Abstract
Deep neural networks are generally trained using iterative gradient updates. Gradient magnitudes are affected by many factors, including the choice of activation functions and initialization. More importantly, gradient magnitudes can differ greatly across layers, with some layers receiving much smaller gradients than others, causing those layers to train more slowly and therefore slowing down overall convergence. We analytically explain this disproportionality. We then propose to explicitly train all layers at the same speed, by scaling the gradient of every trainable tensor to be proportional to the tensor's current value. In particular, at every batch we update all trainable tensors such that the relative change of each tensor's L1 norm is the same across all layers of the network, throughout training. Experiments on MNIST show that our method scales gradients appropriately, making the relative change in trainable tensors approximately equal across layers. In addition, measuring test accuracy over training time shows that our method trains faster than other methods, reaching higher test accuracy within the same budget of training steps.
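The per-layer scaling described above can be sketched as follows. This is one reading of the abstract (rescaling each gradient so the L1 norm of the update is a fixed fraction of the tensor's L1 norm), not necessarily the paper's exact update rule; the function name `percent_delta_step` and the `percent` target are illustrative.

```python
import numpy as np

def percent_delta_step(params, grads, percent=0.001, eps=1e-12):
    """One update over all trainable tensors, scaled so that every
    tensor's relative L1-norm change equals `percent`, regardless of
    the raw gradient magnitude in that layer (a sketch of the idea
    in the abstract)."""
    new_params = []
    for w, g in zip(params, grads):
        g_norm = np.abs(g).sum()  # L1 norm of the raw gradient
        w_norm = np.abs(w).sum()  # L1 norm of the current tensor
        # Choose the scale so that ||scale * g||_1 = percent * ||w||_1.
        scale = percent * w_norm / (g_norm + eps)
        new_params.append(w - scale * g)
    return new_params
```

With this rule, a layer whose gradients are tiny relative to its weights receives a proportionally larger step, so all layers change by the same fraction per batch.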
Similar resources
Impact of Novel Incorporation of CT-based Segment Mapping into a Conjugated Gradient Algorithm on Bone SPECT Imaging: Fundamental Characteristics of a Context-specific Reconstruction Method
Objective(s): The latest single-photon emission computed tomography (SPECT)/computed tomography (CT) reconstruction system, referred to as xSPECT Bone™, is a context-specific reconstruction system utilizing tissue segmentation information from CT data, which is called a zone map. The aim of this study was to evaluate the effects of zone-map enhancement incorporated into the ordered-subset conjug...
A SET-MEMBERSHIP APPROACH TO NORMALIZED PROPORTIONATE ADAPTATION ALGORITHMS (TueAmOR12)
Proportionate adaptive filters can improve the convergence speed for the identification of sparse systems as compared to their conventional counterparts. In this paper, the idea of proportionate adaptation is combined with the framework of set-membership filtering (SMF) in an attempt to derive novel computationally efficient algorithms. The resulting algorithms attain an attractive faster conve...
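The proportionate-adaptation idea this snippet builds on can be illustrated with a classic PNLMS-style update, where each filter coefficient gets a step size proportional to its current magnitude so the few active taps of a sparse system adapt fastest. This is an illustrative sketch of plain proportionate NLMS, not the set-membership variant the paper derives; all names and parameter values are assumptions.

```python
import numpy as np

def pnlms_step(w, x, d, mu=0.5, rho=0.01, delta=1e-6):
    """One proportionate-NLMS update of filter `w` given input
    vector `x` and desired output `d`. Gains `g` give larger steps
    to larger-magnitude coefficients (sketch, not the SMF variant)."""
    e = d - w @ x                                        # a priori error
    g = np.maximum(np.abs(w), rho * max(delta, np.abs(w).max()))
    g = g / g.sum()                                      # proportionate gains
    w_new = w + mu * e * (g * x) / (x @ (g * x) + delta)
    return w_new, e
```

On a sparse target (one nonzero tap), the large tap dominates the gain vector and converges quickly while the zero taps receive only the small floor `rho`.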
New Proportionate Affine Projection Algorithm
A new proportionate-type affine projection algorithm with intermittent update of the weight coefficients is proposed. It takes into account the “history” of the proportionate factors and uses a fast recursive filtering procedure. Also, the effect of using dichotomous coordinate descent iterations is investigated. Simulation results indicate that the proposed algorithm has improved performance a...
Sparse Communication for Distributed Gradient Descent
We make distributed stochastic gradient descent faster by exchanging sparse updates instead of dense updates. Gradient updates are positively skewed as most updates are near zero, so we map the 99% smallest updates (by absolute value) to zero then exchange sparse matrices. This method can be combined with quantization to further improve the compression. We explore different configurations and a...
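The drop-the-smallest-99% scheme described above can be sketched as a top-k sparsification step. Function names (`sparsify`, `densify`) and the index/value exchange format are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def sparsify(grad, keep_ratio=0.01):
    """Keep only the largest `keep_ratio` fraction of gradient entries
    by absolute value; return their flat indices and values, suitable
    for a sparse exchange (sketch of the scheme described above)."""
    flat = grad.ravel()
    k = max(1, int(len(flat) * keep_ratio))
    # argpartition finds the k largest |entries| without a full sort.
    idx = np.argpartition(np.abs(flat), -k)[-k:]
    return idx, flat[idx]

def densify(idx, values, shape):
    """Reconstruct a dense gradient from the exchanged sparse form,
    with all dropped entries mapped to zero."""
    out = np.zeros(int(np.prod(shape)))
    out[idx] = values
    return out.reshape(shape)
```

Exchanging only `(idx, values)` pairs cuts communication roughly by the keep ratio, at the cost of discarding the near-zero bulk of the update.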
Towards health equity: a framework for the application of proportionate universalism
INTRODUCTION The finding that there is a social gradient in health has prompted considerable interest in public health circles. Recent influential works describing health inequities and their causes do not always argue cogently for a policy framework that would drive the most appropriate solutions differentially across the social gradient. This paper aims to develop a practice heuristic for prop...
Journal:
- CoRR
Volume: abs/1708.07227
Pages: -
Publication date: 2017